Before analysis, users should consider conducting finer-scale
filtering in order to clean the NestWatch dataset after running
nw.cleandata. This may include selecting certain species,
identifying specific nest phenology dates (e.g., incubation should not
last longer than X days for species Y), or limiting nest attempts to a
certain geographic area.
Limiting the dataset to just a few species can easily be done using
the pipe (%>%). If you are unfamiliar with “piping”, see
the magrittr package. Below we will subset version 2 of
the NestWatch dataset to include only attempts for Bewick’s Wren
(“bewwre”) and Carolina Wren (“carwre”). The code below walks through
downloading, merging, and filtering to species, but this data subset is
also included in the package for quick access using
wrens <- nestwatchR::wren_quickstart:
# Load dplyr for the pipe and the filter(), select(), and distinct() verbs
library(dplyr)
# Download and merge datasets
nw.getdata(version = 2)
nw.mergedata(attempts = NW.attempts, checks = NW.checks, output = "merged.data")
# Filter data to include only carwre and bewwre
wrens <- merged.data %>% filter(Species.Code %in% c("carwre", "bewwre"))
# View what species are in the new dataset
unique(wrens$Species.Name)
# > [1] "Carolina Wren" "Bewick's Wren"
# Subset dataframe to get just a few columns of interest
wrens <- wrens %>% select(Attempt.ID, Species.Name, Species.Code, Year,
                          Subnational.Code, Latitude, Longitude)
wrens <- wrens %>% distinct()  # Removes duplicate rows (representing individual visits)

Spatial filters are a flexible way to limit data to a predefined
geographic area. You may choose to limit an analysis to nesting attempts
within a certain area, like a single Bird
Conservation Region or a select number of states. Or you may choose
to remove potentially misidentified species by using a range map to
filter out nesting attempts. If the filtering criteria are easily
subset from the dataset, like states and countries (via
Subnational.Code), you can quickly use subsetting rules to
filter your data for analysis. But if those criteria are not already
easily subsettable, a spatial filter can be a good option.
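As a minimal sketch of the simpler, non-spatial case, the snippet below subsets a small made-up data frame by Subnational.Code. The toy rows and the state codes (written here in a "US-CO" style) are assumptions for illustration only; check the values in your own data before filtering.

```r
library(dplyr)

# Toy stand-in for the wrens data (all values are invented for illustration)
toy_wrens <- data.frame(
  Attempt.ID       = c("A1", "A2", "A3", "A4"),
  Species.Code     = c("bewwre", "bewwre", "carwre", "bewwre"),
  Subnational.Code = c("US-CO", "US-NY", "US-NC", "US-CA"),
  stringsAsFactors = FALSE
)

# Keep only attempts recorded in the chosen states
keep_states <- c("US-CO", "US-CA")
toy_subset <- toy_wrens %>% filter(Subnational.Code %in% keep_states)

toy_subset$Attempt.ID
```

Using %in% rather than == lets a single rule match any number of states or provinces.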
As an example, we can first view a plot of where the nests in
wrens are located by species. Here we will use
tmap to produce an interactive map. We will also be
utilizing the sf package to help create and transform our
tabular data into spatial data. We will then project the wrens data into
the Lambert Conformal Conic Projection, which is well suited for mapping
areas in the United States (but you can change the object
prj to any appropriate PROJ.4 string for the area you are
mapping).
[!Note] If you are unfamiliar with working with spatial data, this is a good resource on coordinate reference systems and projections within R.
# Create a spatial object from nest data
nest_points <- sf::st_as_sf(wrens, coords = c("Longitude", "Latitude"), crs = 4326) # data are in WGS 84 (crs = 4326)
# Define desired CRS for data projection
prj <- "+proj=lcc +lon_0=-90 +lat_1=33 +lat_2=45"  # PROJ.4 string defining the projection
# Project the nest points into LCC projection
nest_points <- sf::st_transform(nest_points, crs = prj)  # apply projection
# Map nest locations
library(tmap)
tmap_mode("view") # starts interactive plot
map <- tm_basemap("Esri.WorldGrayCanvas") +  # define basemap
  tm_shape(nest_points) +                    # add nest point data
  tm_dots(col = "Species.Name")              # color nests by species
# View the map
map

By looking at this map, we can see that there are several suspicious nests identified as Bewick’s Wrens in the eastern US, as well as other outliers. Bewick’s Wrens are not typically recorded east of the Mississippi River, so some of these records could be misidentifications. We could decide on a subset of states/provinces to filter out-of-range nest attempts, but a better method might be to filter nest locations based on a range map.
The eBird
Status and Trends Products contain a wealth of information on bird
populations. Among the available products are range maps of species for
which Status and Trend Models have been run. These data are easily
accessible in R through the ebirdst package. To access
these eBird data, you will need to acquire a free access key. This key
will give you access to Status and Trends Data within R. For more
information and to acquire an access key, see the documentation here.
We can use our unique access key to download the range map of
Bewick’s Wren and Carolina Wren. Note, you will need the species codes
of those species you would like to download, not their alpha code or
common name. By modifying the access key, species, and download location
in the code below, you can download the range polygons and load them into your
global environment. This code selects the breeding range layer if
available, and otherwise selects the resident range layer.
Note: You only need to input your access key once (R will store it for
you!) and you only need to download the range maps once (you may get an
error if you rerun ebirdst_download_status() when the data
already exist in the spatialdata_path location). You may
also need to modify the year as noted in the code below.
# Load ebirdst for access to the eBird Status and Trends data products
library(ebirdst)
# Obtain and set an eBird access key
set_ebirdst_access_key("pasteyourkeyhere") # you only need to do this once, R will remember it
# Define what species you want to download by their code
spp <- c("bewwre", "carwre")
# Specify where the data will be downloaded
# Here we will create a folder "spatial" in our working directory:
spatialdata_path <- c("spatial")
# Download range maps by species
for (i in spp) {
  ebirdst_download_status(species = i, download_abundance = FALSE,
                          download_ranges = TRUE, pattern = "_smooth_27km_",
                          path = spatialdata_path)
}
# You may need to modify the year below to reflect the appropriate eBird product that downloaded
# Read in the range files
for (i in spp) {
  # Generate the path to the .gpkg files
  file_path <- paste0(spatialdata_path, "/2022/", i, "/ranges/", i, "_range_smooth_27km_2022.gpkg")
  # Read in the .gpkg file
  range_data <- sf::st_read(file_path)
  # Generate the name for the object
  object_name <- paste0(i, "_range")
  # Assign the value to the dynamically-generated object name
  assign(object_name, range_data)
  rm(range_data)
}
# Select just breeding layer if available, else resident layer
object_names <- paste(spp, "range", sep = "_")
for (i in object_names) {
  if (i %in% ls(envir = .GlobalEnv)) {
    data <- get(i, envir = .GlobalEnv)
    if (any(data$season %in% "breeding")) {
      data <- data %>% filter(season == "breeding")
    } else {
      data <- data %>% filter(season == "resident")
    }
    # Project the range polygon into the same CRS as nest_points
    data <- data %>% sf::st_transform(crs = prj)
    assign(i, data, envir = .GlobalEnv)
    rm(data)
  }
}
# Clean up intermediate objects
rm(file_path, i, object_name, object_names, spatialdata_path)

Now that we have range polygons for Bewick’s and Carolina Wrens, we can add them to our map and investigate our nest locations a bit further. Let’s plot just the Bewick’s Wren data.
# Subset nest locations to Bewick's Wrens
bewwre <- nest_points %>% filter(Species.Code == "bewwre")
# Map the nests onto the range polygon
tmap_mode("view") # starts interactive plot
map <- tm_basemap("Esri.WorldGrayCanvas") +              # define basemap
  tm_shape(shp = bewwre_range, name = "Bewick's Wren") + # add range polygon
  tm_polygons(alpha = 0.5, col = "#a1d1cbff") +          # define polygon color
  tm_shape(bewwre) + tm_dots(col = "Species.Name")       # add nest points
map

We can now see that there are more than a few nests outside of the
typical Bewick’s Wren range. But a few of these nests are also close to
the range border and may truly belong to a Bewick’s Wren. We
can use nw.filterspatial to help us identify and/or remove
nest attempts outside of the range polygon (or any other shapefile you
may want to filter by).
nw.filterspatial requires sf objects for points = and polygon =,
representing the nest points to be filtered and the polygon by which
they are filtered, respectively. The mode = argument
defines whether points identified outside the polygon should be
flagged for review (“flag”) or removed from the dataset (“remove”).
The optional buffer = argument lets the user define a distance
outside the polygon within which nest locations will still be allowed.
This distance can be in either kilometers or miles and should be
specified using buffer_units = "km" or buffer_units = "mi".
The resulting buffer polygon may optionally be
exported to the global environment for saving or plotting using the
logical buffer_output = T. The user may also define their
desired projection using proj = and a PROJ.4
string; if none is provided, the function defaults to the Lambert
Conformal Conic projection, which is well suited for plotting the majority of
NestWatch data. Finally, the optional output = argument can
be used to name the resulting spatially cleaned spatial dataframe.
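Conceptually, a buffered spatial filter of this kind can be sketched directly with sf. The example below is not the implementation of nw.filterspatial; it is a rough illustration, using an invented square polygon and three invented points, of buffering a polygon and marking the points that fall outside it. The "FLAGGED" label and all object names are assumptions made for the sketch.

```r
library(sf)

# LCC projection string used earlier in this section (units are metres)
prj <- "+proj=lcc +lon_0=-90 +lat_1=33 +lat_2=45"

# Invented 100 km x 100 km square standing in for a range polygon
square <- rbind(c(0, 0), c(1e5, 0), c(1e5, 1e5), c(0, 1e5), c(0, 0))
toy_range <- st_sfc(st_polygon(list(square)), crs = prj)

# Invented nests: inside the range, 20 km outside, and 200 km outside
toy_nests <- st_sf(
  Attempt.ID = c("A1", "A2", "A3"),
  geometry = st_sfc(st_point(c(5e4, 5e4)),
                    st_point(c(1.2e5, 5e4)),
                    st_point(c(3e5, 5e4)), crs = prj)
)

# Expand the polygon by 50 km (50,000 m in this projection)
toy_buffer <- st_buffer(toy_range, dist = 50000)

# TRUE where a nest intersects the buffered polygon
inside <- lengths(st_intersects(toy_nests, toy_buffer)) > 0

# Mark nests outside the buffer; nests within it stay NA
toy_nests$Flagged.Location <- ifelse(inside, NA_character_, "FLAGGED")
```

Here the nest 20 km beyond the polygon edge is retained by the 50 km buffer, while the nest 200 km away is marked for review.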
If we zoom in to central Colorado, we can see there are a few Bewick’s Wren nests just outside the range border. We might choose to keep nests like these in our analysis, because they could be correctly identified and just a bit outside the typical range. So, we can define a buffer zone to keep such nests but exclude those well outside the expected range:
nw.filterspatial(points = bewwre,           # Bewick's Wren nest points
                 polygon = bewwre_range,    # Bewick's Wren range shapefile
                 mode = "flag",             # flag points outside
                 buffer = 50,               # add a 50km buffer zone
                 buffer_units = "km",       # units = km
                 buffer_output = TRUE,      # yes, output the buffer polygon
                 proj = "+proj=lcc +lon_0=-90 +lat_1=33 +lat_2=45",  # LCC from above
                 output = "flagged_nests")  # define the output name

We can plot the results to see which points were flagged for review (and would be removed if mode was “remove”):
# Relabel nests within range for nice map symbology
flagged_nests$Flagged.Location[is.na(flagged_nests$Flagged.Location)] <- "In-Range"
map <- tm_basemap("Esri.WorldGrayCanvas") +                    # define basemap
  tm_shape(shp = polygon_buffered, name = "50km Buffer") +     # add buffered polygon
  tm_polygons(alpha = 0.5, col = "lightgoldenrod2") +          # define buffer color
  tm_shape(shp = bewwre_range, name = "Bewick's Wren Range") + # add range polygon
  tm_polygons(alpha = 0.5, col = "#a1d1cbff") +                # define range color
  tm_shape(flagged_nests) +                                    # add nest points
  tm_dots(col = "Flagged.Location",                            # color by "Flagged.Location"
          #style = "cat",
          palette = c("grey60", "green2"))
map

We can now quickly filter out those nests whose locations were flagged as outside of our defined range.
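One way to do this, sketched below with a small invented data frame, is to keep only the rows relabeled "In-Range" above. The "FLAGGED" value used for out-of-range nests is an assumption for illustration; inspect unique(flagged_nests$Flagged.Location) to confirm the labels in your own output.

```r
library(dplyr)

# Invented stand-in for flagged_nests after relabeling NA as "In-Range"
toy_flagged <- data.frame(
  Attempt.ID       = c("A1", "A2", "A3"),
  Flagged.Location = c("In-Range", "FLAGGED", "In-Range"),
  stringsAsFactors = FALSE
)

# Keep only nests that fell inside the buffered range
toy_in_range <- toy_flagged %>% filter(Flagged.Location == "In-Range")

nrow(toy_in_range)
```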
You may also choose to refine the coarse phenologic filtering done in the cleaning phase. NestWatch data are known to have some errors where participants either enter dates incorrectly (e.g., enter the year portion of a date as 2021 in one field and 2020 in the next) or incorrectly continue a nest attempt when it should be a new nest. As an example of the latter, if a bluebird nest fails due to predation and the pair renests in the same box, this should be entered as two different attempts at the same location. But records of “run-on nests”, where the first attempt was not ended, do exist in the dataset. One way to identify or remove such nesting attempts, or ones which fall outside the expected nesting time frame for a given species, is to use phenologic filtering.
Phenologic filtering allows the user to define the allowed maximum
number of days for each different period in the nesting cycle, or for
the whole nesting cycle. The function nw.filterphenology()
uses both the data in the attempt summary info (data originating from
the “Attempts” dataset) and the individual visits data (originating from
the “Checks” dataset). For this function, a small user-created dataframe
in the following format is needed. These values represent the maximum
allowable number of days a nest of a particular species can be in each
nesting stage. In this example we will use NestWatch data for Eastern
Bluebird and Tree Swallow:
# You can provide data for a single species or multiple species to be run at the same time
# Create simple df with maximum allowable # days in each nest phase for Eastern Bluebird and Tree Swallow:
max_days <-
  data.frame(species_code = c("easblu", "treswa"),
             lay          = c(7, 7),    # here we chose max recorded clutch size
             incubation   = c(25, 25),  # here, means plus a bit of buffer
             nestling     = c(25, 30),  # here, means plus a bit of buffer
             total        = c(60, 65))  # here, means plus a bit of buffer
# Download NestWatch dataset and merge
nw.getdata(version = 2)
nw.mergedata(attempts = NW.attempts, checks = NW.checks, output = "data")
# Filter data to select species
data <- data %>% filter(Species.Code %in% c("easblu", "treswa"))  # %in% matches either code; == would recycle the vector
length(unique(data$Attempt.ID))
# > [1] 256582 # number of nesting attempts for Eastern Bluebird and Tree Swallow
# Filter attempts based on nesting phenology
nw.filterphenology(data = data, sp = c("easblu", "treswa"), mode = "flag", phenology = max_days, output = "flagged_phenology")
# How many attempts were flagged?
flagged <- flagged_phenology %>% filter(Flagged.Attempt == "FLAGGED")
length(unique(flagged$Attempt.ID))
# > [1] 96552 # number of attempts which were flagged
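To make the idea behind a maximum-duration check concrete, here is a rough sketch on invented data. It is not how nw.filterphenology() is implemented; the column names First.Lay.Date and Fledge.Date, the dates, and the "FLAGGED" label are assumptions for illustration. It compares each attempt's total span in days with the total value allowed for its species.

```r
library(dplyr)

# Maximum total days per species, mirroring the total column of max_days
toy_max <- data.frame(Species.Code = c("easblu", "treswa"),
                      total = c(60, 65),
                      stringsAsFactors = FALSE)

# Invented attempts: the second spans far longer than a single nest should
toy_attempts <- data.frame(
  Attempt.ID     = c("A1", "A2", "A3"),
  Species.Code   = c("easblu", "easblu", "treswa"),
  First.Lay.Date = as.Date(c("2021-04-10", "2021-04-10", "2021-05-01")),
  Fledge.Date    = as.Date(c("2021-05-20", "2021-07-30", "2021-06-20")),
  stringsAsFactors = FALSE
)

# Join the species limits, compute each attempt's span, and flag long attempts
toy_checked <- toy_attempts %>%
  left_join(toy_max, by = "Species.Code") %>%
  mutate(total_days = as.numeric(Fledge.Date - First.Lay.Date),
         Flagged.Attempt = ifelse(total_days > total, "FLAGGED", NA_character_))

toy_checked %>% select(Attempt.ID, total_days, Flagged.Attempt)
```

In this toy example only the second attempt exceeds its species limit, which is the pattern a run-on nest would produce.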